Scientific Python antipatterns advent calendar day nine

For today, a few points about designing functions. As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Using special values where they are not necessary

As soon as we start writing functions, we have to start thinking about how they will behave in various different circumstances, including ones that appear to require special logic. Imagine that we have a function that takes a word and counts the number of times it contains the letter a:

def count_a(word):
    count = word.lower().count('a') # case insensitive
    return count

count_a('Apple'), count_a('banana')
(1, 3)

No surprises there. But when beginners start to write functions, they often add in extra bits of logic to deal with special cases - for example, if the input does not contain any a characters:

def count_a(word):
    if 'a' not in word.lower():
        return 'no a characters!'
    else:
        count = word.lower().count('a') 
        return count

count_a('Apple'), count_a('banana'), count_a('kiwi')
(1, 3, 'no a characters!')

This is a terrible idea. Not only is the code harder to read, but we now have a function that sometimes returns an integer and sometimes returns a string, which makes it very hard to write code that uses the output. Here we will try to use the output of the function to find fruits that have more than 2 a characters:

for fruit in ['apple', 'banana', 'kiwi']:
    if count_a(fruit) > 2:
        print(fruit)
banana
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 2
      1 for fruit in ['apple', 'banana', 'kiwi']:
----> 2     if count_a(fruit) > 2:
      3         print(fruit)

TypeError: '>' not supported between instances of 'str' and 'int'

but our loop crashes when the function returns a string.
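To see how awkward this gets, here is a rough sketch of what the calling code would need in order to cope with the mixed return type. The name count_a_mixed is just an illustrative stand-in for the version of the function above, and the isinstance check is one possible workaround rather than a recommendation:

```python
def count_a_mixed(word):
    # stand-in for the version above: returns an int or a string
    if 'a' not in word.lower():
        return 'no a characters!'
    return word.lower().count('a')

matches = []
for fruit in ['apple', 'banana', 'kiwi']:
    result = count_a_mixed(fruit)
    # every caller has to remember to check the type before comparing
    if isinstance(result, int) and result > 2:
        matches.append(fruit)

print(matches)  # ['banana']
```

Every piece of code that calls the function would need the same defensive check, which is easy to forget.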

We can improve matters a bit by making sure that our function returns an integer:

def count_a(word):
    if 'a' not in word.lower():
        return -1
    else:
        count = word.lower().count('a') 
        return count

count_a('Apple'), count_a('banana'), count_a('kiwi')
(1, 3, -1)

which allows the loop to run without issue:

for fruit in ['apple', 'banana', 'kiwi']:
    if count_a(fruit) > 2:
        print(fruit)
banana

But the special return value is totally unnecessary - we already have a perfectly good way of representing a word that doesn’t contain any a characters: the number zero. So if we get rid of the special logic entirely:

def count_a(word):
    count = word.lower().count('a') # case insensitive
    return count

count_a('Apple'), count_a('banana'), count_a('kiwi')
(1, 3, 0)

the code is simpler, and our loop still works perfectly well:

for fruit in ['apple', 'banana', 'kiwi']:
    if count_a(fruit) > 2:
        print(fruit)
banana
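Returning zero also keeps arithmetic on the results honest. As a small sketch (count_a_sentinel is a hypothetical name for the earlier version that returns -1), compare what happens when we add up the counts for all the fruits:

```python
def count_a_sentinel(word):
    # hypothetical name for the earlier version that returns -1 for "no a"
    count = word.lower().count('a')
    return count if count > 0 else -1

def count_a(word):
    # the simple version: zero already means "no a characters"
    return word.lower().count('a')

fruits = ['apple', 'banana', 'kiwi']

total_sentinel = sum(count_a_sentinel(f) for f in fruits)  # 1 + 3 + (-1)
total_plain = sum(count_a(f) for f in fruits)              # 1 + 3 + 0

print(total_sentinel, total_plain)  # 3 4
```

The -1 version quietly gives the wrong total without raising any error, whereas the plain version needs no special handling at all.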

A similar situation where beginners run into difficulty is in designing functions that have a true/false answer. Imagine we change our function so that it will simply tell us whether a word has at least one a character:

def check_a(word):
    if word.lower().count('a') == 0:
        return 'no'
    else:
        return 'yes'

check_a('Apple'), check_a('banana'), check_a('kiwi')
('yes', 'yes', 'no')

This works, but we now have to remember when using the function that it returns the strings 'yes' and 'no'. If we get it even slightly wrong, the comparison never matches and the loop silently prints nothing:

for fruit in ['apple', 'banana', 'kiwi']:

    # wrong capitalisation
    if check_a(fruit) == 'Yes':
        print(fruit)

Once we start down this road, it’s very hard to stay consistent. There are many options that we might end up using: 'yes'/'no' or 'true'/'false' or 0/1 or 1/-1.
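There is another trap with string answers: any non-empty string counts as true in an if statement, including 'no'. A minimal sketch, using check_a_str as an illustrative stand-in for the string-returning function above:

```python
def check_a_str(word):
    # stand-in for the string-returning version above
    if word.lower().count('a') == 0:
        return 'no'
    else:
        return 'yes'

printed = []
for fruit in ['apple', 'banana', 'kiwi']:
    # looks reasonable, but 'no' is a non-empty string so it counts as true
    if check_a_str(fruit):
        printed.append(fruit)

print(printed)  # ['apple', 'banana', 'kiwi']
```

So forgetting the explicit comparison does not cause an error; it just lets every fruit through.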

The correct thing to do in Python is to use the built-in values True and False for this purpose - note that they are written without quotes as they are not strings!

def check_a(word):
    if word.lower().count('a') == 0:
        return False
    else:
        return True

check_a('Apple'), check_a('banana'), check_a('kiwi')        
(True, True, False)

Even more Pythonic is to return the expression directly and skip the if/else bit. Note that we have to switch the comparison around to make sure that we are testing for the presence of a characters - it’s easy to make a mistake here:

def check_a(word):
    return word.lower().count('a') > 0

check_a('Apple'), check_a('banana'), check_a('kiwi')        
(True, True, False)

Now the function is beautifully clear, and the loop code can be simplified as well - no need for explicit comparison:

for fruit in ['apple', 'banana', 'kiwi']:
    if check_a(fruit):
        print(fruit)
apple
banana
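Because the function returns real booleans, it also slots neatly into other Python constructs. The following is just a sketch of a few possibilities; the variable names are made up for illustration:

```python
def check_a(word):
    return word.lower().count('a') > 0

fruits = ['apple', 'banana', 'kiwi']

# filter with a list comprehension, no string comparison needed
with_a = [fruit for fruit in fruits if check_a(fruit)]

# negation works naturally with "not"
without_a = [fruit for fruit in fruits if not check_a(fruit)]

# True counts as 1 and False as 0, so sum() counts the matches
how_many = sum(check_a(fruit) for fruit in fruits)

print(with_a, without_a, how_many)  # ['apple', 'banana'] ['kiwi'] 2
```

None of this would work reliably with 'yes'/'no' strings.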

One more time; if you want to see the rest of these little write-ups, sign up for the mailing list:

Sign up for the mailing list